Using Context-sensitive Statistics to Rank Documents

Authors

  • Liang Jeff Chen
  • Yannis Papakonstantinou
Abstract

We study the problem of context-sensitive ranking for document retrieval, where a context is defined as a sub-collection of documents and is specified by queries provided by domain-interested users. The motivation for context-sensitive search is that the ranking for the same keyword query generally depends significantly on the context, because the underlying keyword statistics differ significantly from one context to another. The query evaluation challenge is the computation of keyword statistics at run time, which involves expensive online aggregations. We leverage and extend materialized view research to deliver algorithms and data structures that evaluate context-sensitive queries efficiently. Specifically, a number of views are selected and materialized, each corresponding to one or more large contexts. At query time, the materialized views are used to compute the statistics from which ranking scores are derived. Experimental results show that context-sensitive ranking generally improves ranking quality, while our materialized view-based technique improves query efficiency.
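
The idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not give the ranking function or the view-selection method, so the sketch assumes a plain TF-IDF score with a smoothed IDF, and the corpus, the context labels, and the names `build_view`, `idf`, and `score` are all hypothetical. Each "view" here is simply a precomputed table of document frequencies for one context, so that no aggregation over the sub-collection is needed at query time.

```python
import math
from collections import Counter

# Hypothetical toy corpus: doc id -> (context labels, text).
DOCS = {
    1: ({"cardiology"}, "heart attack risk factors"),
    2: ({"cardiology"}, "heart surgery recovery"),
    3: ({"cardiology"}, "heart valve surgery"),
    4: ({"security"}, "network attack detection"),
    5: ({"security"}, "denial of service attack"),
    6: ({"security"}, "attack surface analysis"),
}


def build_view(texts):
    """Precompute (materialize) document frequencies for a set of documents."""
    texts = list(texts)
    df = Counter()
    for text in texts:
        df.update(set(text.split()))  # count each term once per document
    return {"size": len(texts), "df": df}


def context_texts(docs, label):
    """Return the texts of all documents that belong to the given context."""
    return [text for labels, text in docs.values() if label in labels]


# One view for the whole collection and one per context (assumed contexts).
GLOBAL_VIEW = build_view(text for _, text in DOCS.values())
VIEWS = {
    label: build_view(context_texts(DOCS, label))
    for label in ("cardiology", "security")
}


def idf(term, view):
    """Smoothed inverse document frequency read from a materialized view."""
    return math.log((view["size"] + 1) / (view["df"][term] + 1)) + 1.0


def score(query, text, view):
    """TF-IDF score of a document for a query, using statistics from a view."""
    tf = Counter(text.split())
    return sum(tf[term] * idf(term, view) for term in query.split())


if __name__ == "__main__":
    query = "heart attack"
    doc_text = DOCS[1][1]
    print("global score:    ", score(query, doc_text, GLOBAL_VIEW))
    print("cardiology score:", score(query, doc_text, VIEWS["cardiology"]))
```

In this toy data, "heart" appears in every cardiology document, so inside that context it carries less weight than "attack", whereas over the whole collection the opposite holds. That shift in keyword statistics is what makes the same query rank differently from one context to another.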

Related resources

A Ranker for Semantic Spell Checking Using Context-Sensitive Features

Nowadays, a large volume of documents is generated daily. These documents are written by many different people and therefore contain spelling errors, which degrade their quality. Automatic writing-assistance tools such as spell checkers and correctors can help improve that quality. Context-sensitive errors are misspelled words that have been...

Learning to Rank using Query-Level Rules

Most existing learning to rank methods neglect query-sensitive information while producing functions to estimate the relevance of documents (i.e., all examples in the training data are treated indistinctly, no matter the query associated with them). This is counter-intuitive, since the relevance of a document depends on the query context (i.e., the same document may have different relevances, d...

Relative Rank Statistics for Dialog Analysis

We introduce the relative rank differential statistic which is a non-parametric approach to document and dialog analysis based on word frequency rank-statistics. We also present a simple method to establish semantic saliency in dialog, documents, and dialog segments using these word frequency rank statistics. Applications of our technique include the dynamic tracking of topic and semantic evolu...

Information Routing Using a Corpus Distribution

The research goal of information routing (IR) is to retrieve and rank a collection of text documents that match a user profile (Harman 1995). Ideally, the profile can be derived automatically from a set of documents the user has identified as relevant to a particular topic of interest. This work assumes that the user has provided this small set of documents. It is then our goal t...

Towards a context sensitive approach to searching information based on domain specific knowledge sources

In the context of document retrieval in the biomedical domain, this paper introduces a novel approach to searching for biomedical information using contextual semantic information. More specifically, we propose to combine the contextual semantic information in documents and user queries in an attempt to improve the performance of biomedical information retrieval (IR) systems. Contextual informa...

Publication date: 2010